Woylier: Alternative tour frame interpolation method

The woylier package implements alternative method for interpolation path between tour frames using Givens rotation.

Zoljargal Batsaikhan https://www.britannica.com/animal/quokka (Monash University) , Dianne Cook https://www.britannica.com/animal/bilby (Monash University) , Ursula Laa https://www.britannica.com/animal/bilby (BOKU University)
2022-10-10

Introduction

The idea of projection pursuit is a procedure used to locate the projection of high-to-low dimensional space that expose the most interesting feature of data originally proposed by Kruskal(Kruskal 1969). The projection pursuit technique involves a criterion of interest, numerical objective function, and the most interesting projection of data is achieved when the criterion is maximized. In the literature, there are a number of such criterion has been developed based on clustering, spread, and outliers.

Grand tour is a multivariate data visualization technique developed by Asimov(Asimov 1985), which is based on idea of rotations of a lower dimensional projection in high-dimensional space. This technique is equivalent to rotating an object in 3D to better understand its shape and dimensions. Originally, Asimov’s grand tour presents the viewer with an automatic movie of projections with no user control. Since then the literature on the tour has been about interactivity of the tour giving the control to users.(Buja et al. 2005) One of the interactive visual data exploration tool is “guided tour”.

The guided tour gives user an extrapolation power by combining projection pursuit with tour which is implemented in the “tourr”(Wickham et al. 2011) package. Current implementation of guided tour used geodesic interpolation between planes.

The interpolated paths based on geodesic interpolation between bases is visually invariant under changes of orientation. However, in some cases of non-linear projection pursuit the orientation of frames does matter. One example is splines2D index.(Laa and Cook 2020) The lack of rotation invariance of splines2d index raise complications in the optimization process.(Laa and Cook 2020) This rotational variety issue of non-linear projection pursuit functions was the motivation of this work.

A few alternatives to geodesic interpolation were proposed by Buja et al(Buja et al. 2005). The purpose of woylier package is to implement Givens paths method. This algorithms adapts Given’s matrix decomposition technique that zeros out all but first elements of a vector.

This article is structured as follows. The next section provided the theoretical framework of Givens interpolation method followed by a section about the implementation Givens path in woylier package that is compatible with current geodesic_path() function of tourr package. Furthermore, we would attempt to apply this interpolation method to projection pursuit of splines index to search for nonlinear associations between variables in example data set. Finally, this article includes a discussion about the limitations.

Background

The tour method of visualization is animated high-to-low dimensional data rotation that is a movie, one-parameter (time) family of static projections. Several algorithms for such dynamic projections are discussed by Buja et al(Buja et al. 2005) is based on the idea of smoothly interpolating a discrete sequence of projections.

The topic of this article is the construction of paths of projections. Interpolation of paths of projection can be compared to connecting line segments that interpolate points in Euclidean space. Interpolation acts as a bridge between continuous animation and discrete choice of sequences of projections. Sequence of projections can be constructed in various ways depending on user purpose. If user wants to look at the data from all sides, a random sequence of projections can be used, which is implemented in grand tours. Furthermore, the sequence of projections can be pre-computed, data-driven, or even manually controlled.

Before continuing with the interpolation algorithms, we need to define the notations.

\[F^TF = I_d\]

Preprojection algorithm

The purpose of preprojection is to limit the data subspace \(F_a\) and \(F_z\) that the interpolated path is traversing so that the interpolation is carried out simply. Simply, it is defining the joint subspace of \(F_a\) and \(F_z\).

The procedure starts with forming an arbitrary orthonormal basis by applying Gram-Schmidt to \(F_z\) with regard to \(F_a\). We denote the resulting orthonormal basis of \(p\times d_s\) by \(B\).

Now, we can express the original frames in this basis:

\[F_a = BW_a, F_z = BW_z\]

The interpolation problem is then reduced to the construction of paths of frames \(W(t)\) that interpolate the projected frames \(W_a\) and \(W_z\).

The interpolating paths of plane versus frames

Current implementation of tourr package(Wickham et al. 2011) is locally shortest (geodesic) interpolation of planes. The pitfall of this interpolation method is that it does not account for rotation variability. However, the interpolation of frames is required when the orientation of projection matters. If the rendering on a frame and on the rotated version of the frame yields the same visual scenes, it means the orientation does not matter.

The orientation of frames could be important when non-linear projection pursuit function is used in guided tour. An illustration of such cases are shown below.

Givens rotation

A rotation matrix is a transformation matrix that is used to perform a rotation in Euclidean space in xy plane. A rotation matrix that transforms 2-D plane by an angle \(\theta\) looks like this:

\[ \begin{bmatrix}\cos \theta &-\sin \theta \\\sin \theta &\cos \theta \end{bmatrix} \]

If the rotation is in the plane of variables i and j, it is called Givens rotation. The Givens is useful for introducing zeros on a grand scale and used for computing the QR decomposition of matrix in linear algebra problems. One advantage over other transformation methods which is particularly useful in our case is the ability to zero elements more selectively.

The interpolation methods in the woylier package is based on the fact that in any vector u one can zero out the i-th coordinate with a Givens rotation in the \((i, j)\)-plane for any \(j\neq i\). This rotation affects only coordinate \(i\) and \(j\) and leave all other coordinates unchanged.

Givens interpolation path algorithm

We established that the between frames interpolation is necessary when the orientation of the projection matters. There are several methods discussed in the paper including decomposition of orthogonal matrices, Givens decomposition and Householder decomposition. One that isof interest to us is Givens path.

Sequences of Givens rotations can map any orthonormal d-frame F in p-space to standard d-frame \(E_d=((1, 0, 0, ...)^T, (0, 1, 0, ...)^T, ...)\).

The interpolation path construction algorithm from starting frame \(F_a\) to target frame \(F_z\) is illustrated below. The example is 2D path construction process of original 6D data frame.

  1. Construct preprojection basis \(B\) by orthonormalizing \(F_z\) with regards tp \(F_a\) with Gram-Schmidt:

\[B = (F_a, F_{\star})\]

\(F_a\) and \(F_z\) are \(p\times d\) or \(6\times2\) matrices that are orthonormal. The preprojection basis \(B\) is \(p\times 2d\) matrix that is \(6\times 4\) in our example.

  1. Get the preprojected frames using the preprojection basis \(B\). \[W_a = B^TF_a = E_d\] and \[W_z = B^TF_z\] Since \(B\) is orthonormalized basis of \(F_z\) with regard to \(F_a\), \(W_a\) is \(2d\times d\) matrix of 1, 0s by construction that looks like in our example:

\[ \begin{bmatrix}1 & 0 \\0 &1 \\ 0&0 \\0&0\end{bmatrix} \]

\(W_z\) is orthonormal \(2d\times d\) matrix that looks like:

\[ \begin{bmatrix} a_{11} & a_{12} \\a_{21} &a_{22} \\ a_{31}&a_{32} \\a_{41}&a_{42}\end{bmatrix} \]

  1. Then, we can construct a sequence of Givens rotations that maps \(W_z\) to \(W_a\):

\[ W_a = R_m(\theta_m) ... R_2(\theta_2)R_1(\theta_1)W_z\]

At each rotation, the angle \(\theta_i\) that zero out the second coordinate of a plane is calculated.

When \(d = 2\), there are 5 rotations involved with 5 different angles that makes each elements 0. For example, the first rotation angle \(\theta_1\) is an angle between \((1, 0)\) and \((a_{11}, a_{21})\). This rotation matrix would make element \(a_{21}\) zero:

\[R_1(\theta_1) = G(1, 2, \theta_1) = \begin{bmatrix} cos\theta_1 & -sin\theta_1 & 0 & 0 \\sin\theta_1 &cos\theta_1 & 0 &0 \\ 0&0&1&0 \\0&0&0&1\end{bmatrix}\]

  1. The inverse mapping is obtained by reversing the sequence of rotations with the negative of the angles, we starts from the starting basis and end at the target basis.

\[R(\tau) = R_1(-\tau_1) ... R_m(-\tau_m), \ W_z = R(\tau)W_a\] This step should include the time parameter, \(t\), so it shows the interpolation process rendered in the movie-like sequence.

  1. Finally, we reconstruct our original frame using \(B\). This reconstruction is done at each step of interpolation so we have interpolated path as result.

\[F_t = B * W_t\]

Projection pursuit index functions

The properties of several projection pursuit index functions were investigated.(Laa and Cook 2020) The smoothness, squintability, flexibility, rotation invariance, and speed of projection pursuit index functions were examined. The one property that is interesting to us is rotation invariance. The rotational invariance is examined by computing projection pursuit index fro different rotations within 2D plane. It is established that the dcor2d, splines2d and TIC index are not rotationally invariant.

Splines2D index measures nonlinear association between variable by fitting spline model.(Laa and Cook 2020) The functional dependence is stronger if the index value is larger.

Sphere and torus

In order to illustrate the interpolated path of projections we used geozoo(Schloerke 2016) package. 1D basis is plotted on unit sphere, while 2D basis is visualized on torus. The points on the surface of sphere and torus shape are randomly generated by functions from the geozoo(Schloerke 2016) package.

Introduction

Interactive data graphics provides plots that allow users to interact them. One of the most basic types of interaction is through tooltips, where users are provided additional information about elements in the plot by moving the cursor over the plot.

This paper will first review some R packages on interactive graphics and their tooltip implementations. A new package ToOoOlTiPs that provides customized tooltips for plot, is introduced. Some example plots will then be given to showcase how these tooltips help users to better read the graphics.

Background

Some packages on interactive graphics include plotly (Sievert 2020) that interfaces with Javascript for web-based interactive graphics, crosstalk (Cheng and Sievert 2021) that specializes cross-linking elements across individual graphics. The recent R Journal paper tsibbletalk (Wang and Cook 2021) provides a good example of including interactive graphics into an article for the journal. It has both a set of linked plots, and also an animated gif example, illustrating linking between time series plots and feature summaries.

Customizing tooltip design with ToOoOlTiPs

ToOoOlTiPs is a packages for customizing tooltips in interactive graphics, it features these possibilities.

A gallery of tooltips examples

The palmerpenguins data (Horst et al. 2020) features three penguin species which has a lovely illustration by Alison Horst in Figure 1.

A picture of three different penguins with their species: Chinstrap, Gentoo, and Adelie.

Figure 1: Artwork by @allison_horst

Table 1 prints at the first few rows of the penguins data:

Table 1: A basic table
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Adelie Torgersen NA NA NA NA NA 2007
Adelie Torgersen 36.7 19.3 193 3450 female 2007
Adelie Torgersen 39.3 20.6 190 3650 male 2007

Figure 2 shows an interactive plot of the penguins data, made using the plotly package.

p <- penguins %>% 
  ggplot(aes(x = bill_depth_mm, y = bill_length_mm, 
             color = species)) + 
  geom_point()
ggplotly(p)

Figure 2: A basic interactive plot made with the plotly package on palmer penguin data. Three species of penguins are plotted with bill depth on the x-axis and bill length on the y-axis. When hovering on a point, a tooltip will show the exact value of the bill depth and length for that point, along with the species name.

Summary

We have displayed various tooltips that are available in the package ToOoOlTiPs.

CRAN packages used

ToOoOlTiPs, plotly, crosstalk, tsibbletalk, palmerpenguins, ggplot2

CRAN Task Views implied by cited packages

Spatial, TeachingStatistics, TimeSeries, WebTechnologies

D. Asimov. The grand tour: A tool for viewing multidimensional data. SIAM J Sci Stat Comput 6(1), 128–143, 1985.
A. Buja, D. Cook, D. Asimov and C. Hurley. Computational methods for high-dimensional rotations in data visualization. Handbook of Statistics, 391–413, 2005. DOI 10.1016/s0169-7161(04)24014-7.
J. Cheng and C. Sievert. crosstalk: Inter-widget interactivity for HTML widgets. 2021. URL https://CRAN.R-project.org/package=crosstalk. R package version 1.1.1.
A. M. Horst, A. P. Hill and K. B. Gorman. palmerpenguins: Palmer archipelago (antarctica) penguin data. 2020. URL https://allisonhorst.github.io/palmerpenguins/. R package version 0.1.0.
JB. Kruskal. Toward a practical method which helps uncover the structure of a set of observations by finding the line transformation which optimizes a new “index of condensation. Statistical computation; New York, Academic Press, 427–440, 1969.
U. Laa and D. Cook. Using tours to visually investigate properties of new projection pursuit indexes with application to problems in physics. Comput Stat 35, 1171–1205, 2020. DOI https://doi.org/10.1007/s00180-020-00954-8.
B. Schloerke. Geozoo: Zoo of geometric objects. 2016. URL https://CRAN.R-project.org/package=geozoo. R package version 0.5.1.
C. Sievert. Interactive Web-Based Data Visualization with r, plotly, and shiny. Chapman; Hall/CRC, 2020. URL https://plotly-r.com.
E. Wang and D. Cook. Conversations in time: Interactive visualisation to explore structured temporal data. The R Journal, 2021. URL https://journal.r-project.org/archive/2021/RJ-2021-050/index.html.
H. Wickham, D. Cook, H. Hofmann and A. Buja. tourr: An R package for exploring multivariate data with projections. Journal of Statistical Software, 40(2): 1–18, 2011. URL https://doi.org/10.18637/jss.v040.i02.

References

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Batsaikhan, et al., "Woylier: Alternative tour frame interpolation method", The R Journal, 2022

BibTeX citation

@article{woylier_article,
  author = {Batsaikhan, Zoljargal and Cook, Dianne and Laa, Ursula},
  title = {Woylier: Alternative tour frame interpolation method},
  journal = {The R Journal},
  year = {2022},
  note = {https://doi.org/10.32614/woylier_article},
  doi = {10.32614/woylier_article},
  issn = {2073-4859},
  pages = {1}
}